This notebook demonstrates how to use the cell2location model for mapping a single cell reference cell types onto a spatial transcriptomic dataset. Here, we use a 10X single nucleus RNA-sequencing (snRNAseq) and Visium spatial transcriptomic data generated from adjacent tissue sections of the mouse brain (Kleshchevnikov et al., BioRxiv 2020).
The first step of our model (#2 in Fig 1, tutorial 1/3) is to estimate reference cell type signatures from scRNA-seq profiles, for example as obtained using conventional clustering to identify cell types and subpopulations followed by estimation of average cluster gene expression profiles (Suppl. Methods, Section 2, Fig S1). Cell2location implements this estimation step based on Negative Binomial regression, which allows to robustly combine data across technologies and batches (Suppl. Methods, Section 2).

Figure 1. Overview of the spatial mapping approach and the workflow which are enabled by cell2location. From left to right: Single-cell RNA-seq and spatial transcriptomics profiles are generated from the same tissue (1). Cell2location takes reference cell type signatures derived from scRNA-seq and spatial transcriptomics data as input (2, 3). The model then decomposes spatially resolved multi-cell RNA counts matrices into the reference signatures, thereby establishing a spatial mapping of cell types (4).
In the second step covered by this notebook (#4 in Fig 1), cell2location decomposes mRNA counts in spatial transcriptomic data using these reference signatures, thereby estimating the relative and absolute abundance of each cell type at each spatial location (Suppl. Methods, Section 1, Fig S1).
The cell2location workflow consists of three sections:
I. Estimating reference expression signatures of cell types (1/3)
II. Spatially mapping cell types (2/3, this notebook):
III. Results and downstream analysis (3/3)
First, we need to load the relevant packages and tell cell2location to use the GPU. cell2location is written in pymc3 language for probabilistic modelling that uses a deep learning library called theano for heavy computations. While the package works on both GPU and CPU, using the GPU significantly shortens the computation time for 10X Visium datasets. Using the CPU is more feasible for smaller datasets with fewer spatial locations (e.g. Nanostring WTA technology).
import sys
#if branch is stable, will install via pypi, else will install from source
#branch = "pyro-cell2location"
#user = "vitkl"
#IN_COLAB = "google.colab" in sys.modules
#if IN_COLAB and branch == "stable":
# !pip install --quiet scvi-tools[tutorials]
#elif IN_COLAB and branch != "stable":
# !pip install --quiet --upgrade jsonschema
# !pip install --quiet git+https://github.com/$user/scvi-tools@$branch#egg=scvi-tools[tutorials]
#import sys
#if not IN_COLAB:
sys.path.insert(1, '/nfs/team205/vk7/sanger_projects/BayraktarLab/scvi-tools/')
import scanpy as sc
import anndata
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib as mpl
#import cell2location
import scvi
from matplotlib import rcParams
rcParams['pdf.fonttype'] = 42 # enables correct plotting of text
import seaborn as sns
Tips on initializing GPU
THEANO_FLAGS='force_device=True' forces the package to use GPU. Pay attention to error messages that might indicate theano failed to initalise the GPU. E.g. failure to use cuDNN will lead to significant slowdown.
Above you should see a message similar to this confirming that theano started using the GPU:
/lib/python3.7/site-packages/theano/gpuarray/dnn.py:184: UserWarning: Your cuDNN version is more recent than Theano. If you encounter problems, try updating Theano or downgrading cuDNN to a version >= v5 and <= v7.
warnings.warn("Your cuDNN version is more recent than "
Using cuDNN version 7605 on context None
Mapped name None to device cuda0: Tesla V100-SXM2-32GB (0000:89:00.0)
Do not forget to change device=cuda0 to your available GPU id. Use device=cuda / device=cuda0 if you have just one locally or if you are requesting one GPU via HPC cluster job. You can see availlable GPU by openning a terminal in jupyter and running nvidia-smi.
In this tutorial, we use a paired Visium and snRNAseq reference dataset of the mouse brain (i.e. generated from adjacent tissue sections). There are two biological replicates and several tissue sections from each brain, totalling 5 10X visium samples.
First, we need to download and unzip spatial data, as well as download estimated signatures of reference cell types, from our data portal:
# Set paths to data and results used through the document:
sp_data_folder = '/nfs/team205/vk7/sanger_projects/large_data/gut_kj_re/oxford_visium/'
sc_data_folder = '/nfs/team205/vk7/sanger_projects/large_data/gut_kj_re/'
results_folder = '/nfs/team205/vk7/sanger_projects/collaborations/fetal_gut_mapping/results/oxford/'
sc_results_folder = '/nfs/team205/vk7/sanger_projects/collaborations/fetal_gut_mapping/results/'
annotations_folder = '/nfs/team205/vk7/sanger_projects/collaborations/fetal_gut_mapping/results/tissue_annotation/oxford/'
scvi_run_name = f'{results_folder}scvi_stereoscope_selected_optimised_long/'
regression_model_output = 'amortised_signatures_lr0002_clip_norm10'
reg_path = f'{sc_results_folder}regression_model/{regression_model_output}/'
Now, let's read the spatial Visium data from the 10X Space Ranger output and examine several QC plots. Here, we load the our Visium mouse brain experiments (i.e. sections) and corresponding histology images into a single anndata object adata.
def read_and_qc(sample_name, path=sp_data_folder):
r""" This function reads the data for one 10X spatial experiment into the anndata object.
It also calculates QC metrics. Modify this function if required by your workflow.
:param sample_name: Name of the sample
:param path: path to data
"""
adata = sc.read_visium(path + str(sample_name),
count_file='filtered_feature_bc_matrix.h5', load_images=True)
adata.obs['sample'] = sample_name
adata.var['SYMBOL'] = adata.var_names
adata.var.rename(columns={'gene_ids': 'ENSEMBL'}, inplace=True)
adata.var_names = adata.var['ENSEMBL']
adata.var.drop(columns='ENSEMBL', inplace=True)
# Calculate QC metrics
sc.pp.calculate_qc_metrics(adata, inplace=True)
adata.var['mt'] = [gene.startswith('mt-') for gene in adata.var['SYMBOL']]
adata.obs['mt_frac'] = adata[:, adata.var['mt'].tolist()].X.sum(1).A.squeeze()/adata.obs['total_counts']
# add sample name to obs names
adata.obs["sample"] = [str(i) for i in adata.obs['sample']]
adata.obs_names = adata.obs["sample"] \
+ '_' + adata.obs_names
adata.obs.index.name = 'spot_id'
return adata
def select_slide(adata, s, s_col='sample'):
r""" This function selects the data for one slide from the spatial anndata object.
:param adata: Anndata object with multiple spatial experiments
:param s: name of selected experiment
:param s_col: column in adata.obs listing experiment name for each location
"""
slide = adata[adata.obs[s_col].isin([s]), :]
s_keys = list(slide.uns['spatial'].keys())
s_spatial = np.array(s_keys)[[s in k for k in s_keys]][0]
slide.uns['spatial'] = {s_spatial: slide.uns['spatial'][s_spatial]}
return slide
#######################
# Read the list of spatial experiments
sample_data = pd.DataFrame(['A1', 'A2'],
columns=['sample_name'])
# Read the data into anndata objects
slides = []
for i in sample_data['sample_name']:
slides.append(read_and_qc(i, path=sp_data_folder))
# Combine anndata objects together
adata = slides[0].concatenate(
slides[1:],
batch_key="sample",
uns_merge="unique",
batch_categories=sample_data['sample_name'],
index_unique=None
)
#######################
Variable names are not unique. To make them unique, call `.var_names_make_unique`. Variable names are not unique. To make them unique, call `.var_names_make_unique`. Variable names are not unique. To make them unique, call `.var_names_make_unique`. Variable names are not unique. To make them unique, call `.var_names_make_unique`.
# mitochondria-encoded (MT) genes should be removed for spatial mapping
adata.obsm['mt'] = adata[:, adata.var['mt'].values].X.toarray()
adata = adata[:, ~adata.var['mt'].values]
/nfs/team283/vk7/software/miniconda3farm5/envs/scvi-env2/lib/python3.7/site-packages/pandas/core/arrays/categorical.py:2487: FutureWarning: The `inplace` parameter in pandas.Categorical.remove_unused_categories is deprecated and will be removed in a future version. res = method(*args, **kwargs)
# read histology-based annotations:
adata.obs['Annotation'] = ''
for s in adata.obs['sample'].unique():
annot_ = pd.read_csv(f'{annotations_folder}{s}_annotation.csv', index_col='Barcode')
annot_.index = [f'{s}_{i}' for i in annot_.index]
adata.obs.loc[annot_.index, 'Annotation'] = annot_['Annotation']
adata.obs['Annotation'].value_counts(dropna=False)
Trying to set attribute `.obs` of view, copying.
Epithelial 1752 Other_tissue 1649 NaN 596 LowQ_tissue 527 Stem_cell_zone 396 Lymphoid_structure 45 Name: Annotation, dtype: int64
# select locations from good quality tissue:
tissue_ind = adata.obs['Annotation'].isin(['Stem_cell_zone', 'Epithelial',
'Other_tissue', 'Lymphoid_structure'])
print(sum(tissue_ind))
adata = adata[tissue_ind, :]
adata.obs['Annotation'].value_counts(dropna=False)
3842
Epithelial 1752 Other_tissue 1649 Stem_cell_zone 396 Lymphoid_structure 45 Name: Annotation, dtype: int64
Now let's look at QC: total number of counts and total number of genes per location across Visium experiments.
# PLOT QC FOR EACH SAMPLE
fig, axs = plt.subplots(len(slides), 4, figsize=(15, 4*len(slides)-4))
for i, s in enumerate(adata.obs['sample'].unique()):
#fig.suptitle('Covariates for filtering')
slide = select_slide(adata, s)
sns.distplot(slide.obs['total_counts'],
kde=False, ax = axs[i, 0])
axs[i, 0].set_xlim(0, adata.obs['total_counts'].max())
axs[i, 0].set_xlabel(f'total_counts | {s}')
sns.distplot(slide.obs['total_counts']\
[slide.obs['total_counts']<20000],
kde=False, bins=40, ax = axs[i, 1])
axs[i, 1].set_xlim(0, 20000)
axs[i, 1].set_xlabel(f'total_counts | {s}')
sns.distplot(slide.obs['n_genes_by_counts'],
kde=False, bins=60, ax = axs[i, 2])
axs[i, 2].set_xlim(0, adata.obs['n_genes_by_counts'].max())
axs[i, 2].set_xlabel(f'n_genes_by_counts | {s}')
sns.distplot(slide.obs['n_genes_by_counts']\
[slide.obs['n_genes_by_counts']<6000],
kde=False, bins=60, ax = axs[i, 3])
axs[i, 3].set_xlim(0, 6000)
axs[i, 3].set_xlabel(f'n_genes_by_counts | {s}')
plt.tight_layout()
Trying to set attribute `.uns` of view, copying. /nfs/team283/vk7/software/miniconda3farm5/envs/scvi-env2/lib/python3.7/site-packages/seaborn/distributions.py:2557: FutureWarning: `distplot` is a deprecated function and will be removed in a future version. Please adapt your code to use either `displot` (a figure-level function with similar flexibility) or `histplot` (an axes-level function for histograms). warnings.warn(msg, FutureWarning) /nfs/team283/vk7/software/miniconda3farm5/envs/scvi-env2/lib/python3.7/site-packages/seaborn/distributions.py:2557: FutureWarning: `distplot` is a deprecated function and will be removed in a future version. Please adapt your code to use either `displot` (a figure-level function with similar flexibility) or `histplot` (an axes-level function for histograms). warnings.warn(msg, FutureWarning) /nfs/team283/vk7/software/miniconda3farm5/envs/scvi-env2/lib/python3.7/site-packages/seaborn/distributions.py:2557: FutureWarning: `distplot` is a deprecated function and will be removed in a future version. Please adapt your code to use either `displot` (a figure-level function with similar flexibility) or `histplot` (an axes-level function for histograms). warnings.warn(msg, FutureWarning) /nfs/team283/vk7/software/miniconda3farm5/envs/scvi-env2/lib/python3.7/site-packages/pandas/core/arrays/categorical.py:2487: FutureWarning: The `inplace` parameter in pandas.Categorical.remove_unused_categories is deprecated and will be removed in a future version. res = method(*args, **kwargs) Trying to set attribute `.uns` of view, copying. /nfs/team283/vk7/software/miniconda3farm5/envs/scvi-env2/lib/python3.7/site-packages/seaborn/distributions.py:2557: FutureWarning: `distplot` is a deprecated function and will be removed in a future version. Please adapt your code to use either `displot` (a figure-level function with similar flexibility) or `histplot` (an axes-level function for histograms). warnings.warn(msg, FutureWarning) /nfs/team283/vk7/software/miniconda3farm5/envs/scvi-env2/lib/python3.7/site-packages/seaborn/distributions.py:2557: FutureWarning: `distplot` is a deprecated function and will be removed in a future version. Please adapt your code to use either `displot` (a figure-level function with similar flexibility) or `histplot` (an axes-level function for histograms). warnings.warn(msg, FutureWarning)
Next, we show how to plot these QC values over the histology image using standard scanpy tools
slide = select_slide(adata, 'A1')
with mpl.rc_context({'figure.figsize': [6,7],
'axes.facecolor': 'white'}):
sc.pl.spatial(slide, img_key = "hires", cmap='magma',
library_id=list(slide.uns['spatial'].keys())[0],
color=['total_counts', 'n_genes_by_counts'], size=1,
gene_symbols='SYMBOL', show=False, return_fig=True)
/nfs/team283/vk7/software/miniconda3farm5/envs/scvi-env2/lib/python3.7/site-packages/pandas/core/arrays/categorical.py:2487: FutureWarning: The `inplace` parameter in pandas.Categorical.remove_unused_categories is deprecated and will be removed in a future version. res = method(*args, **kwargs) Trying to set attribute `.uns` of view, copying. ... storing 'Annotation' as categorical ... storing 'feature_types' as categorical ... storing 'genome' as categorical ... storing 'SYMBOL' as categorical
slide = select_slide(adata, 'A2')
with mpl.rc_context({'figure.figsize': [6,7],
'axes.facecolor': 'white'}):
sc.pl.spatial(slide, img_key = "hires", cmap='magma',
library_id=list(slide.uns['spatial'].keys())[0],
color=['total_counts', 'n_genes_by_counts'], size=1,
gene_symbols='SYMBOL', show=False, return_fig=True)
/nfs/team283/vk7/software/miniconda3farm5/envs/scvi-env2/lib/python3.7/site-packages/pandas/core/arrays/categorical.py:2487: FutureWarning: The `inplace` parameter in pandas.Categorical.remove_unused_categories is deprecated and will be removed in a future version. res = method(*args, **kwargs) Trying to set attribute `.uns` of view, copying. ... storing 'Annotation' as categorical ... storing 'feature_types' as categorical ... storing 'genome' as categorical ... storing 'SYMBOL' as categorical
slide = select_slide(adata, 'A1')
with mpl.rc_context({'figure.figsize': [12,14],
'axes.facecolor': 'black'}):
slide.obs["CCL19_CCL21_CXCL13"] = slide[:,slide.var['SYMBOL'].isin(["CCL19", "CCL21", "CXCL13"])].X.sum(1)
slide.obs["CCL19_CCL21_CXCL13_selected_15"] = np.array(slide.obs["CCL19_CCL21_CXCL13"] > 15).astype(float)
slide.obs["CCL19_CCL21_CXCL13_selected_5"] = np.array(slide.obs["CCL19_CCL21_CXCL13"] > 5).astype(float)
sc.pl.spatial(slide,
color=["CCL19", "CCL21", "CXCL13", "CCL19_CCL21_CXCL13",
'CCL19_CCL21_CXCL13_selected_15', 'CCL19_CCL21_CXCL13_selected_5'], #img_key=None, size=1,
vmin=0, cmap='magma', vmax='p99.5',
ncols=2,
gene_symbols='SYMBOL'
)
/nfs/team283/vk7/software/miniconda3farm5/envs/scvi-env2/lib/python3.7/site-packages/pandas/core/arrays/categorical.py:2487: FutureWarning: The `inplace` parameter in pandas.Categorical.remove_unused_categories is deprecated and will be removed in a future version. res = method(*args, **kwargs) Trying to set attribute `.uns` of view, copying. /nfs/team283/vk7/software/miniconda3farm5/envs/scvi-env2/lib/python3.7/site-packages/pandas/core/arrays/categorical.py:2487: FutureWarning: The `inplace` parameter in pandas.Categorical.remove_unused_categories is deprecated and will be removed in a future version. res = method(*args, **kwargs) ... storing 'Annotation' as categorical ... storing 'feature_types' as categorical ... storing 'genome' as categorical ... storing 'SYMBOL' as categorical
slide = select_slide(adata, 'A2')
with mpl.rc_context({'figure.figsize': [12,14],
'axes.facecolor': 'black'}):
slide.obs["CCL19_CCL21_CXCL13"] = slide[:,slide.var['SYMBOL'].isin(["CCL19", "CCL21", "CXCL13"])].X.sum(1)
slide.obs["CCL19_CCL21_CXCL13_selected_15"] = np.array(slide.obs["CCL19_CCL21_CXCL13"] > 15).astype(float)
slide.obs["CCL19_CCL21_CXCL13_selected_5"] = np.array(slide.obs["CCL19_CCL21_CXCL13"] > 5).astype(float)
sc.pl.spatial(slide,
color=["CCL19", "CCL21", "CXCL13", "CCL19_CCL21_CXCL13",
'CCL19_CCL21_CXCL13_selected_15', 'CCL19_CCL21_CXCL13_selected_5'], #img_key=None, size=1,
vmin=0, cmap='magma', vmax='p99.5',
ncols=2,
gene_symbols='SYMBOL'
)
/nfs/team283/vk7/software/miniconda3farm5/envs/scvi-env2/lib/python3.7/site-packages/pandas/core/arrays/categorical.py:2487: FutureWarning: The `inplace` parameter in pandas.Categorical.remove_unused_categories is deprecated and will be removed in a future version. res = method(*args, **kwargs) Trying to set attribute `.uns` of view, copying. /nfs/team283/vk7/software/miniconda3farm5/envs/scvi-env2/lib/python3.7/site-packages/pandas/core/arrays/categorical.py:2487: FutureWarning: The `inplace` parameter in pandas.Categorical.remove_unused_categories is deprecated and will be removed in a future version. res = method(*args, **kwargs) ... storing 'Annotation' as categorical ... storing 'feature_types' as categorical ... storing 'genome' as categorical ... storing 'SYMBOL' as categorical
slide = select_slide(adata, 'A2')
with mpl.rc_context({'figure.figsize': [12,14],
'axes.facecolor': 'black'}):
sc.pl.spatial(slide,
color=["CEACAM7"], #img_key=None, size=1,
vmin=0, cmap='magma', vmax='p99.5',
ncols=2,
gene_symbols='SYMBOL'
)
/nfs/team283/vk7/software/miniconda3farm5/envs/scvi-env2/lib/python3.7/site-packages/pandas/core/arrays/categorical.py:2487: FutureWarning: The `inplace` parameter in pandas.Categorical.remove_unused_categories is deprecated and will be removed in a future version. res = method(*args, **kwargs) Trying to set attribute `.uns` of view, copying. ... storing 'Annotation' as categorical ... storing 'feature_types' as categorical ... storing 'genome' as categorical ... storing 'SYMBOL' as categorical
Add counts matrix as adata.raw
adata_vis = adata.copy()
def remove_special_characters(adata, special_list=[' ', '\(', '\)', '\+', '\\\\', '/', '-', '\.', ';'],
replace='_'):
from re import sub
for s in special_list:
adata.var.columns = [sub(s, replace, c) for c in adata.var.columns]
adata.obs.columns = [sub(s, replace, c) for c in adata.obs.columns]
if adata.raw is not None:
adata.raw.var.columns = [sub(s, replace, c) for c in adata.raw.var.columns]
return adata
adata_vis_seu = adata_vis.copy()
adata_vis_seu = remove_special_characters(adata_vis_seu)
adata_vis_seu = remove_special_characters(adata_vis_seu, ['10X', '_10X'])
#adata_vis_seu.raw.var.drop(columns=adata_vis_seu.raw.var.columns, inplace=True)
adata_vis_seu.var_names = adata_vis_seu.var['SYMBOL']
adata_vis_seu.var.drop(columns=adata_vis_seu.var.columns, inplace=True)
adata_vis_seu.obs = adata_vis_seu.obs[['sample']]
del adata_vis_seu.obsm['mt']
del adata_vis_seu.obsp
del adata_vis_seu.uns
adata_vis_seu.write(f'{sp_data_folder}/sp_for_seurat_selected.h5ad')
adata_vis.raw = adata_vis
Now we apply the standard scanpy processing pipeline to the spatial Visium data to show experiment to experiment variability in the data. Importatly, this workflow will show the extent of batch differences in your data (cell2location works on samples jointly, see below).
In this mouse brain dataset, only a few regions should be different between sections because we are using 2 samples from biological replicates sectioned at a slightly different location along the anterior-posterior axis in the mouse brain. We see general alighnment of locations from both experiments and some mismatches, but as you will see in the downstream analysis notebook most of the differences between experiments here come from batch effect, which cell2location can account for.
adata_vis_plt = adata_vis.copy()
# Log-transform (log(data + 1))
sc.pp.log1p(adata_vis_plt)
# Find highly variable genes within each sample
adata_vis_plt.var['highly_variable'] = False
for s in adata_vis_plt.obs['sample'].unique():
adata_vis_plt_1 = adata_vis_plt[adata_vis_plt.obs['sample'].isin([s]), :]
sc.pp.highly_variable_genes(adata_vis_plt_1, min_mean=0.0125, max_mean=5, min_disp=0.5, n_top_genes=1000)
hvg_list = list(adata_vis_plt_1.var_names[adata_vis_plt_1.var['highly_variable']])
adata_vis_plt.var.loc[hvg_list, 'highly_variable'] = True
# Scale the data ( (data - mean) / sd )
sc.pp.scale(adata_vis_plt, max_value=10)
# PCA, KNN construction, UMAP
sc.tl.pca(adata_vis_plt, svd_solver='arpack', n_comps=40, use_highly_variable=True)
sc.pp.neighbors(adata_vis_plt, n_neighbors = 20, n_pcs = 40, metric='cosine')
sc.tl.umap(adata_vis_plt, min_dist = 0.3, spread = 1)
with mpl.rc_context({'figure.figsize': [8, 8],
'axes.facecolor': 'white'}):
sc.pl.umap(adata_vis_plt, color=['sample'], size=30,
color_map = 'RdPu', ncols = 1, #legend_loc='on data',
legend_fontsize=10)
/nfs/team283/vk7/software/miniconda3farm5/envs/scvi-env2/lib/python3.7/site-packages/pandas/core/arrays/categorical.py:2487: FutureWarning: The `inplace` parameter in pandas.Categorical.remove_unused_categories is deprecated and will be removed in a future version. res = method(*args, **kwargs) Trying to set attribute `.uns` of view, copying. Trying to set attribute `.uns` of view, copying. ... storing 'Annotation' as categorical ... storing 'feature_types' as categorical ... storing 'genome' as categorical ... storing 'SYMBOL' as categorical
Next, we load the pre-processed snRNAseq reference anndata object that contains estimated expression signatures of reference cell types (see notebook 1/3).
## snRNAseq reference (raw counts)
adata_snrna_raw = sc.read(f'{reg_path}sc.h5ad')
adata_snrna_raw2 = anndata.read_h5ad(sc_data_folder + "FINAL_OBJECT_raw_nosoupx.h5ad")
adata_snrna_raw.obsm = adata_snrna_raw2[adata_snrna_raw.obs_names,:].obsm
/nfs/team283/vk7/software/miniconda3farm5/envs/scvi-env2/lib/python3.7/site-packages/pandas/core/arrays/categorical.py:2487: FutureWarning: The `inplace` parameter in pandas.Categorical.remove_unused_categories is deprecated and will be removed in a future version. res = method(*args, **kwargs)
with mpl.rc_context({'figure.figsize': [10, 10],
'axes.facecolor': 'white'}):
sc.pl.umap(adata_snrna_raw, color=['Integrated_05'], size=15,
color_map = 'RdPu', ncols = 1, legend_loc='on data',
legend_fontsize=10)
# find shared genes and subset both anndata and reference signatures
intersect = np.intersect1d(adata_vis.var_names, adata_snrna_raw.var_names)
adata_vis = adata_vis[:, intersect].copy()
adata_snrna_raw = adata_snrna_raw[:, intersect].copy()
/nfs/team283/vk7/software/miniconda3farm5/envs/scvi-env2/lib/python3.7/site-packages/pandas/core/arrays/categorical.py:2487: FutureWarning: The `inplace` parameter in pandas.Categorical.remove_unused_categories is deprecated and will be removed in a future version. res = method(*args, **kwargs)
from scvi.data import register_tensor_from_anndata
from scvi.external.stereoscope import RNAStereoscope, SpatialStereoscope
import scvi
rcParams['figure.figsize'] = 5, 5
adata_snrna_raw.raw.X.data
array([ 1., 1., 4., ..., 16., 3., 30.], dtype=float32)
adata_snrna_raw.layers["counts"] = adata_snrna_raw.raw.X.copy()
scvi.data.setup_anndata(adata_snrna_raw, layer = "counts", labels_key = "Integrated_05")
train = True
if train:
sc_model = RNAStereoscope(adata_snrna_raw)
sc_model.train(max_epochs = 4000, batch_size=1024)
sc_model.history["elbo_train"][2500:].plot()
sc_model.save(f"{scvi_run_name}scmodel", overwrite=True)
else:
adata_snrna_raw = RNAStereoscope.load(f"{scvi_run_name}scmodel", adata_snrna_raw)
print("Loaded RNA model from file!")
INFO No batch_key inputted, assuming all cells are same batch INFO Using labels from adata.obs["Integrated_05"] INFO Using data from adata.layers["counts"] INFO Computing library size prior per batch
/nfs/team283/vk7/software/miniconda3farm5/envs/scvi-env2/lib/python3.7/site-packages/pandas/core/arrays/categorical.py:2487: FutureWarning: The `inplace` parameter in pandas.Categorical.remove_unused_categories is deprecated and will be removed in a future version. res = method(*args, **kwargs)
INFO Successfully registered anndata object containing 153901 cells, 14388 vars, 1 batches, 76 labels, and 0 proteins. Also registered 0 extra categorical covariates and 0 extra continuous covariates. INFO Please do not further modify adata until model is trained.
GPU available: True, used: True TPU available: False, using: 0 TPU cores LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
Epoch 4000/4000: 100%|██████████| 4000/4000 [14:57:32<00:00, 13.46s/it, loss=4.3e+03, v_num=1]
adata_vis.layers["counts"] = adata_vis.X.copy()
scvi.data.setup_anndata(adata_vis, layer="counts")
INFO No batch_key inputted, assuming all cells are same batch INFO No label_key inputted, assuming all cells have same label INFO Using data from adata.layers["counts"] INFO Computing library size prior per batch INFO Successfully registered anndata object containing 3842 cells, 14388 vars, 1 batches, 1 labels, and 0 proteins. Also registered 0 extra categorical covariates and 0 extra continuous covariates. INFO Please do not further modify adata until model is trained.
/nfs/team283/vk7/software/miniconda3farm5/envs/scvi-env2/lib/python3.7/site-packages/pandas/core/arrays/categorical.py:2487: FutureWarning: The `inplace` parameter in pandas.Categorical.remove_unused_categories is deprecated and will be removed in a future version. res = method(*args, **kwargs)
train=True
if train:
spatial_model = SpatialStereoscope.from_rna_model(adata_vis, sc_model)
spatial_model.train(max_epochs = 40000, batch_size=1024)
spatial_model.history["elbo_train"][5000:].plot()
spatial_model.save(f"{scvi_run_name}stmodel", overwrite = True)
else:
spatial_model = SpatialStereoscope.load(f"{scvi_run_name}stmodel", adata_vis)
print("Loaded Spatial model from file!")
GPU available: True, used: True TPU available: False, using: 0 TPU cores LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
Epoch 40000/40000: 100%|██████████| 40000/40000 [3:32:10<00:00, 3.14it/s, loss=2.25e+07, v_num=1]
import pandas as pd
adata_vis.obsm["decomposition"] = spatial_model.get_proportions()
for ct in adata_vis.obsm["decomposition"].columns:
adata_vis.obs[ct] = adata_vis.obsm["decomposition"][ct]
adata_vis.write(f"{scvi_run_name}stmodel/sp_40k_released.h5ad")
f"{scvi_run_name}stmodel/sp_40k_released.h5ad"
... storing 'Annotation' as categorical ... storing 'feature_types' as categorical ... storing 'genome' as categorical ... storing 'SYMBOL' as categorical ... storing 'feature_types' as categorical ... storing 'genome' as categorical ... storing 'SYMBOL' as categorical
'/nfs/team205/vk7/sanger_projects/collaborations/fetal_gut_mapping/results/oxford/scvi_stereoscope_selected_optimised_long/stmodel/sp_40k_released.h5ad'
def plot_spatial_per_cell_type(adata,
cell_type=adata_vis.obsm["decomposition"].columns,
samples=['A1', 'A2'],
ncol=2):
n_samples = len(samples)
nrow = int(np.ceil(n_samples / ncol))
fig, axs = plt.subplots(nrow, ncol, figsize=(24, 8))
if nrow == 1:
axs = axs.reshape((1, ncol))
col_name = f'{cell_type}'
vmax = np.quantile(adata_vis.obs[col_name].values, 0.992)
#adata_vis.obs[cell_type] = adata_vis.obs[col_name].copy()
from itertools import chain
ind = list(chain(*[[(i, j) for i in range(nrow)] for j in range(ncol)]))
for i, s in enumerate(samples):
sp_data_s = select_slide(adata, s, s_col='sample')
sc.pl.spatial(sp_data_s, cmap='magma',
color=cell_type,
size=1.3, img_key='hires', alpha_img=1,
vmin=0, vmax=vmax, ax=axs[ind[i][0],ind[i][1]], show=False
)
axs[ind[i][0],ind[i][1]].title.set_text(cell_type+'\n'+s)
fig.tight_layout(pad=0.5)
return fig
plot_spatial_per_cell_type(adata_vis, cell_type=adata_vis.obsm["decomposition"].columns[0]);
/nfs/team283/vk7/software/miniconda3farm5/envs/scvi-env2/lib/python3.7/site-packages/pandas/core/arrays/categorical.py:2487: FutureWarning: The `inplace` parameter in pandas.Categorical.remove_unused_categories is deprecated and will be removed in a future version. res = method(*args, **kwargs) Trying to set attribute `.uns` of view, copying. Trying to set attribute `.uns` of view, copying.
adata_vis.obsm["decomposition"].sum(1)
spot_id
A1_AAACAAGTATCTCCCA-1 1.0
A1_AAACAGAGCGACTCCT-1 1.0
A1_AAACATTTCCCGGATT-1 1.0
A1_AAACCACTACACAGAT-1 1.0
A1_AAACCCGAACGAAATC-1 1.0
...
A2_TTGTGTATGCCACCAA-1 1.0
A2_TTGTGTTTCCCGAAAG-1 1.0
A2_TTGTTGTGTGTCAAGA-1 1.0
A2_TTGTTTCACATCCAGG-1 1.0
A2_TTGTTTCCATACAACT-1 1.0
Length: 3842, dtype: float32
import os
decomposition = adata_vis.obs[adata_vis.obsm["decomposition"].columns]
decomposition = (decomposition.T / decomposition.sum(1)).T
adata_vis.obs.loc[:, adata_vis.obsm["decomposition"].columns] = decomposition
with mpl.rc_context({"axes.facecolor": "black"}):
clust_names = adata_vis.obsm["decomposition"].columns
for s in adata_vis.obs['sample'].unique():
s_ind = adata_vis.obs['sample'] == s
s_keys = list(adata_vis.uns['spatial'].keys())
s_spatial = np.array(s_keys)[[s in i for i in s_keys]][0]
fig = sc.pl.spatial(adata_vis[s_ind, :], cmap='magma',
color=clust_names, ncols=5, library_id=s_spatial,
size=1.3, img_key='hires', alpha_img=1,
vmin=0, vmax='p99.2',
return_fig=True, show=False)
fig_dir = f"{scvi_run_name}stmodel/spatial/"
if not os.path.exists(fig_dir):
os.mkdir(fig_dir)
fig_dir = f"{scvi_run_name}stmodel/spatial/per_sample/"
if not os.path.exists(fig_dir):
os.mkdir(fig_dir)
plt.savefig(f"{fig_dir}W_cell_proportion_{s}.png",
bbox_inches='tight')
plt.close()
/nfs/team283/vk7/software/miniconda3farm5/envs/scvi-env2/lib/python3.7/site-packages/pandas/core/arrays/categorical.py:2487: FutureWarning: The `inplace` parameter in pandas.Categorical.remove_unused_categories is deprecated and will be removed in a future version. res = method(*args, **kwargs)
We find regions by clustering spots based on inferred molecule contributions of each cell type. We use leiden clustering that incorporates both the similarity of spots in cell locations and in their proximity, by including both when computing the KNN graph. Results are saved in adata_vis.obs['leiden']. Since the clustering is done jointly, the cluster identities match between sections.
# compute KNN using the cell2location output
sc.pp.neighbors(adata_vis, use_rep='decomposition',
n_neighbors = 20, metric='correlation')
# Cluster spots into regions using scanpy
sc.tl.leiden(adata_vis, resolution=0.7)
adata_vis.obs["region_cluster"] = adata_vis.obs["leiden"]
adata_vis.obs["region_cluster"] = adata_vis.obs["region_cluster"].astype("category")
sc.tl.umap(adata_vis, min_dist = 0.2, spread = 1.5)
sc.settings.figdir = f"{scvi_run_name}stmodel/"
#del adata_vis.uns['region_cluster_colors']
rcParams['figure.figsize'] = 10, 10
rcParams["axes.facecolor"] = "white"
sc.pl.umap(adata_vis, color=['sample', 'region_cluster'],
color_map = 'RdPu', ncols = 2, legend_loc='on data',
legend_fontsize=10, size=20, save='umap_region_cluster.pdf')
WARNING: saving figure to file /nfs/team205/vk7/sanger_projects/collaborations/fetal_gut_mapping/results/oxford/scvi_stereoscope_selected_optimised_long/stmodel/umapumap_region_cluster.pdf
# Plot locations of clusters
rcParams["figure.figsize"] = [10, 10]
rcParams["axes.facecolor"] = "white"
for s in list(adata_vis.obs["sample"].unique()):
slide = select_slide(adata_vis, s)
sel_clust = adata_vis.obs['region_cluster'].cat.categories.isin(slide.obs['region_cluster'].cat.categories)
slide.uns['region_cluster_colors'] = list(np.array(adata_vis.uns['region_cluster_colors'])[sel_clust])
sc.pl.spatial(slide, #library_id=s,
color=["region_cluster"], img_key='hires', size=1,
#palette=adata_vis.uns['region_cluster_colors']
);
### Export regions for import to 10X Loupe Browser
# add binary labels for each region
region_cluster_bin = adata_vis.obs[['region_cluster']]
# save maps for each sample separately
sam = np.array(adata_vis.obs['sample'])
for i in np.unique(sam):
s1 = region_cluster_bin
s1 = s1.loc[sam == i]
s1.index = [x[10:] for x in s1.index]
s1.index.name = 'Barcode'
s1.to_csv(f"{scvi_run_name}stmodel/"\
+'/region_cluster29_' + i + '.csv')
/nfs/team283/vk7/software/miniconda3farm5/envs/scvi-env2/lib/python3.7/site-packages/pandas/core/arrays/categorical.py:2487: FutureWarning: The `inplace` parameter in pandas.Categorical.remove_unused_categories is deprecated and will be removed in a future version. res = method(*args, **kwargs) Trying to set attribute `.uns` of view, copying.
Trying to set attribute `.uns` of view, copying.
Modules and their versions used for this analysis
import sys
for module in sys.modules:
try:
print(module,sys.modules[module].__version__)
except:
try:
if type(sys.modules[module].version) is str:
print(module,sys.modules[module].version)
else:
print(module,sys.modules[module].version())
except:
try:
print(module,sys.modules[module].VERSION)
except:
pass
sys 3.7.9 (default, Aug 31 2020, 12:42:55) [GCC 7.3.0] ipykernel 5.3.4 ipykernel._version 5.3.4 json 2.0.9 re 2.2.1 IPython 7.20.0 IPython.core.release 7.20.0 logging 0.5.1.2 zlib 1.0 traitlets 5.0.5 traitlets._version 5.0.5 argparse 1.1 ipython_genutils 0.2.0 ipython_genutils._version 0.2.0 platform 1.0.8 IPython.core.crashhandler 7.20.0 pygments 2.7.4 pexpect 4.8.0 ptyprocess 0.7.0 decorator 4.4.2 pickleshare 0.7.5 backcall 0.2.0 sqlite3 2.6.0 sqlite3.dbapi2 2.6.0 _sqlite3 2.6.0 prompt_toolkit 3.0.8 wcwidth 0.2.5 jedi 0.17.0 parso 0.8.1 colorama 0.4.4 ctypes 1.1.0 _ctypes 1.1.0 urllib.request 3.7 jupyter_client 6.1.7 jupyter_client._version 6.1.7 zmq 20.0.0 zmq.backend.cython 40303 zmq.backend.cython.constants 40303 zmq.sugar 20.0.0 zmq.sugar.constants 40303 zmq.sugar.version 20.0.0 jupyter_core 4.7.1 jupyter_core.version 4.7.1 tornado 6.1 _curses b'2.2' dateutil 2.8.1 dateutil._version 2.8.1 six 1.15.0 decimal 1.70 _decimal 1.70 distutils 3.7.9 scanpy 1.7.0 scanpy._metadata 1.7.0 packaging 20.9 packaging.__about__ 20.9 importlib_metadata 1.7.0 csv 1.0 _csv 1.0 numpy 1.20.0 numpy.version 1.20.0 numpy.core 1.20.0 numpy.core._multiarray_umath 3.1 numpy.lib 1.20.0 numpy.linalg._umath_linalg 0.1.5 scipy 1.6.0 scipy.version 1.6.0 anndata 0.7.5 anndata._metadata 0.7.5 h5py 3.1.0 h5py.version 3.1.0 cached_property 1.5.2 natsort 7.1.1 pandas 1.2.1 pytz 2021.1 pandas.compat.numpy.function 1.20.0 sinfo 0.3.1 stdlib_list v0.8.0 numba 0.52.0 yaml 5.3.1 llvmlite 0.35.0 pkg_resources._vendor.appdirs 1.4.3 pkg_resources.extern.appdirs 1.4.3 pkg_resources._vendor.packaging 20.4 pkg_resources._vendor.packaging.__about__ 20.4 pkg_resources.extern.packaging 20.4 pkg_resources._vendor.pyparsing 2.2.1 pkg_resources.extern.pyparsing 2.2.1 numba.misc.appdirs 1.4.1 sklearn 0.24.1 sklearn.base 0.24.1 joblib 1.0.0 joblib.externals.loky 2.9.0 joblib.externals.cloudpickle 1.6.0 scipy._lib.decorator 4.0.5 scipy.linalg._fblas b'$Revision: $' scipy.linalg._flapack b'$Revision: $' scipy.linalg._flinalg b'$Revision: $' scipy.special.specfun b'$Revision: $' scipy.ndimage 2.0 scipy.optimize.minpack2 b'$Revision: $' scipy.sparse.linalg.isolve._iterative b'$Revision: $' scipy.sparse.linalg.eigen.arpack._arpack b'$Revision: $' scipy.optimize._lbfgsb b'$Revision: $' scipy.optimize._cobyla b'$Revision: $' scipy.optimize._slsqp b'$Revision: $' scipy.optimize._minpack 1.10 scipy.optimize.__nnls b'$Revision: $' scipy.linalg._interpolative b'$Revision: $' scipy.integrate._odepack 1.9 scipy.integrate._quadpack 1.13 scipy.integrate._ode $Id$ scipy.integrate.vode b'$Revision: $' scipy.integrate._dop b'$Revision: $' scipy.integrate.lsoda b'$Revision: $' scipy.interpolate._fitpack 1.7 scipy.interpolate.dfitpack b'$Revision: $' scipy.stats.statlib b'$Revision: $' scipy.stats.mvn b'$Revision: $' sklearn.utils._joblib 1.0.0 leidenalg 0.8.3 igraph 0.8.3 texttable 1.6.3 igraph.version 0.8.3 matplotlib 3.3.4 pyparsing 2.4.7 cycler 0.10.0 kiwisolver 1.3.1 PIL 8.1.0 PIL._version 8.1.0 PIL.Image 8.1.0 xml.etree.ElementTree 1.3.0 cffi 1.14.4 tables 3.6.1 numexpr 2.7.2 numexpr.version 2.7.2 legacy_api_wrap 1.2 get_version 2.1 scvi 0.0.0 torch 1.8.1+cu102 torch.version 1.8.1+cu102 tarfile 0.9.0 torch.cuda.nccl 2708 torch.backends.cudnn 7605 tqdm 4.56.0 tqdm.cli 4.56.0 tqdm.version 4.56.0 tqdm._dist_ver 4.56.0 ipywidgets 7.6.3 ipywidgets._version 7.6.3 _cffi_backend 1.14.4 pycparser 2.20 pycparser.ply 3.9 pycparser.ply.yacc 3.10 pycparser.ply.lex 3.10 threadpoolctl 2.1.0 pyro 1.6.0 opt_einsum v3.3.0 pyro._version 1.6.0 pytorch_lightning 1.3.3 pytorch_lightning.__about__ 1.3.3 torchmetrics 0.2.0 deprecate 0.3.0 fsspec 2021.04.0 tensorboard 2.4.1 tensorboard.version 2.4.1 google.protobuf 3.14.0 tensorboard.compat.tensorflow_stub stub tensorboard.compat.tensorflow_stub.pywrap_tensorflow 0 seaborn 0.11.1 seaborn.external.husl 2.1.0 statsmodels 0.12.2 umap 0.5.0 pynndescent 0.5.1